The misuse of large language models (LLMs) has garnered significant attention from the general public and LLM vendors. In response, efforts have been made to align LLMs with human values and intended use. However, a particular type of adversarial prompt, known as the jailbreak prompt, has emerged and continuously evolved to bypass these safeguards and elicit harmful content from LLMs. In this paper, we conduct the first measurement study of jailbreak prompts in the wild, with 6,387 prompts collected from four platforms over six months. Leveraging natural language processing technologies and graph-based community detection methods, we discover unique characteristics of jailbreak prompts and their major attack strategies, such as prompt injection an...
Red-teaming has been a widely adopted way to evaluate the harmfulness of Large Language Models (LLMs...
Most Americans own at least one “smart device.” These include smartphones and video game consoles. D...
With the increase in software vulnerabilities that cause significant economic and social losses, aut...
Large Language Models (LLMs), such as ChatGPT and GPT-4, are designed to provide useful and safe res...
Large language models (LLMs), designed to provide helpful and safe responses, often rely on alignmen...
Large language models (LLMs), such as ChatGPT, have emerged with astonishing capabilities approachin...
Jailbreak vulnerabilities in Large Language Models (LLMs), which exploit meticulously crafted prompt...
Large Language Models (LLMs) continue to advance in their capabilities, yet this progress is accompa...
The past year has seen rapid acceleration in the development of large language models (LLMs). For ma...
Large language models (LLMs) have taken the world by storm with their massive multi-tasking capabil...
Recently, Large Language Models (LLMs) have made significant advancements and are now widely used ac...
Spurred by the recent rapid increase in the development and distribution of large language models (L...
Large language models (LLMs) are susceptible to red teaming attacks, which can induce LLMs to genera...
Large Language Models (LLMs) are increasingly deployed as the backend for a variety of real-world ap...
Engaging in the deliberate generation of abnormal outputs from large language models (LLMs) by attac...